This document was created with R Markdown as an example for the Movement Ecology course proposed to the Center for Wildlife Studies. It contains all the annotations needed to reproduce the practical section of the class. R Markdown is useful for annotating code, for reproducibility, for writing reports that include plots and results from R, and for creating tutorials such as this one.
Storing the package names in a single object lets us install, and then load, all the packages at once with the function sapply.
# create packages list
packages <- list("plyr","dplyr","magrittr","lubridate","sp","sf","mapview","ggplot2")
# install the packages
# sapply(packages, install.packages)
# load the packages (character.only = TRUE because the names are passed as strings)
sapply(packages, require, character.only = TRUE)
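As a variation on the step above, the sketch below checks which packages are available before loading them and installs missing ones only on request. The helper name `load_packages` and its `install_missing` argument are illustrative assumptions, not part of the course material.

```r
# Hypothetical helper: load several packages, optionally installing missing ones
load_packages <- function(pkgs, install_missing = FALSE) {
  # requireNamespace() tests availability without attaching the package
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!available]
  if (install_missing && length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  # character.only = TRUE tells require() that the name is passed as a string
  vapply(pkgs, function(p) {
    suppressWarnings(require(p, character.only = TRUE, quietly = TRUE))
  }, logical(1))
}

# Base packages are used here so the example runs on any installation
status <- load_packages(c("stats", "utils"))
print(status)
```

The returned named logical vector shows at a glance which packages failed to load.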
Load the data I have shared with you. This dataset contains GPS locations of 4 individual barren-ground caribou (Rangifer tarandus groenlandicus and granti) from a population in Canada. Barren-ground caribou are the migratory ecotype of caribou and travel hundreds of kilometers between their winter and summer ranges.
# change the path accordingly
caribou <- read.csv("C://Users/ohcourio/Documents/Applications/2022/Center for wildlife studies/CourseExample/caribou.csv")
Now that we have loaded the data, we can see what it looks like. Let’s look at the dimensions of the data frame, the names of the columns and the top rows.
# dimensions
dim(caribou)
## [1] 18074 6
# column names
names(caribou)
## [1] "ID" "sex" "Time" "Year" "Lon" "Lat"
# top rows
head(caribou)
## ID sex Time Year Lon Lat
## 1 Dancer f 4/1/2002 0:00 2002 -133.8427 62.77028
## 2 Dancer f 4/1/2002 8:00 2002 -133.8394 62.76716
## 3 Dancer f 4/1/2002 16:00 2002 -133.8453 62.77767
## 4 Dancer f 4/2/2002 0:00 2002 -133.8643 62.80010
## 5 Dancer f 4/2/2002 8:00 2002 -133.8621 62.79918
## 6 Dancer f 4/2/2002 16:00 2002 -133.8625 62.80007
The data has 18074 rows and 6 columns:
- ID: the unique identifier of the animal
- sex: the sex of the animal
- Time: the date and time of each fix (i.e., GPS location)
- Year: the year of the fix
- Lon and Lat: the longitude and latitude of the fix
We can also look at the structure of the data (the class of each column of the data frame), using the str function.
# Look at the structure of the data
str(caribou)
## 'data.frame': 18074 obs. of 6 variables:
## $ ID : chr "Dancer" "Dancer" "Dancer" "Dancer" ...
## $ sex : chr "f" "f" "f" "f" ...
## $ Time: chr "4/1/2002 0:00" "4/1/2002 8:00" "4/1/2002 16:00" "4/2/2002 0:00" ...
## $ Year: int 2002 2002 2002 2002 2002 2002 2002 2002 2002 2002 ...
## $ Lon : num -134 -134 -134 -134 -134 ...
## $ Lat : num 62.8 62.8 62.8 62.8 62.8 ...
This function gives the dimensions of the data frame and the names and classes of the columns.
We see that ID, sex and Time are stored as characters. Manipulation and visualization are easier if ID and sex are factors. Movement data are spatio-temporal: they are time series in which each GPS location (fix) has an associated date and time. Since R can handle dates and times as such, we will convert the Time column to a date-time class using the as.POSIXct function.
The Time column currently looks like this:
# check the time format
caribou$Time[1]
## [1] "4/1/2002 0:00"
When converting Time to a date-time, we need to specify in as.POSIXct the time zone, which is “UTC”, and the format. To find out how to write the format, look at the strptime help page.
?strptime
In the Details section, you can find the codes used to specify the format. In our case, Time is in the format “month/day/year hour:minute”, as follows: “m/d/yyyy h:mm”. According to strptime, the format is thus “%m/%d/%Y %H:%M”.
plyr, dplyr and magrittr are useful for rearranging data frames in one statement instead of several lines of code. Below is how we would do it without these packages.
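To see why the order of the format codes matters, here is a minimal sketch in base R. The three strings are made-up examples in the same layout as the Time column.

```r
# Example strings in the same "month/day/year hour:minute" layout as the Time column
x <- c("4/1/2002 0:00", "4/1/2002 8:00", "4/2/2002 16:00")

# Correct format: month first, then day
parsed <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M")

# Swapping %m and %d still parses without error, but gives the wrong dates
swapped <- as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")
format(swapped, "%Y-%m-%d %H:%M")

# A format that does not match the text at all returns NA
unmatched <- as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")
anyNA(unmatched)
```

Because a swapped day and month can parse silently, it is worth comparing a few converted values with the raw strings, and checking for NA, after any conversion.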
# create a new data frame to not overwrite the raw data in case of error
caribou2 <- caribou
# set ID and sex as factors and Time as Date-time without using plyr and magrittr
caribou2$ID <- as.factor(caribou2$ID)
caribou2$sex <- as.factor(caribou2$sex)
caribou2$Time <- as.POSIXct(caribou2$Time, format = "%m/%d/%Y %H:%M", tz = "UTC")
head(caribou2)
## ID sex Time Year Lon Lat
## 1 Dancer f 2002-04-01 00:00:00 2002 -133.8427 62.77028
## 2 Dancer f 2002-04-01 08:00:00 2002 -133.8394 62.76716
## 3 Dancer f 2002-04-01 16:00:00 2002 -133.8453 62.77767
## 4 Dancer f 2002-04-02 00:00:00 2002 -133.8643 62.80010
## 5 Dancer f 2002-04-02 08:00:00 2002 -133.8621 62.79918
## 6 Dancer f 2002-04-02 16:00:00 2002 -133.8625 62.80007
With magrittr and the function mutate, everything can be done in one statement. The pipe “%>%” passes the object on its left as the first argument of the function on its right. The function mutate (provided by both plyr and dplyr; because dplyr was loaded after plyr, its version is the one used here) changes several columns of a data frame at once.
# work on a copy so the raw data are not overwritten in case of error;
# set ID and sex as factors and Time as Date-time in a single statement
caribou2 <- caribou %>% mutate(ID = as.factor(ID),
                               sex = as.factor(sex),
                               Time = as.POSIXct(Time, format = "%m/%d/%Y %H:%M", tz = "UTC"))
head(caribou2)
## ID sex Time Year Lon Lat
## 1 Dancer f 2002-04-01 00:00:00 2002 -133.8427 62.77028
## 2 Dancer f 2002-04-01 08:00:00 2002 -133.8394 62.76716
## 3 Dancer f 2002-04-01 16:00:00 2002 -133.8453 62.77767
## 4 Dancer f 2002-04-02 00:00:00 2002 -133.8643 62.80010
## 5 Dancer f 2002-04-02 08:00:00 2002 -133.8621 62.79918
## 6 Dancer f 2002-04-02 16:00:00 2002 -133.8625 62.80007
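The following sketch illustrates how a pipe relates to an ordinary nested call. It uses a small made-up data frame (`toy`) and assumes dplyr is installed; loading dplyr also makes the `%>%` pipe available.

```r
library(dplyr)  # also makes the %>% pipe available

# Made-up data frame for illustration
toy <- data.frame(ID = c("Dancer", "Vixen"), sex = c("f", "f"))

# Nested call: read from the inside out
nested <- head(mutate(toy, ID = as.factor(ID)), 1)

# Piped call: read from left to right; x %>% f(y) is evaluated as f(x, y)
piped <- toy %>% mutate(ID = as.factor(ID)) %>% head(1)

# Both forms give the same result
identical(nested, piped)
```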
Now we can look at the number of individuals, their sex and their monitoring periods.
# Individuals
levels(caribou2$ID)
## [1] "Comet" "Dancer" "Prancer" "Vixen"
# sex
levels(caribou2$sex)
## [1] "f" "m"
# number of observations per individual and sex
table(caribou2$ID, caribou2$sex)
##
## f m
## Comet 0 4360
## Dancer 8563 0
## Prancer 0 2631
## Vixen 2520 0
# monitoring
by(caribou2$Time, caribou2$ID, range)
## caribou2$ID: Comet
## [1] "2006-04-01 00:00:00 UTC" "2008-04-01 16:00:00 UTC"
## ------------------------------------------------------------
## caribou2$ID: Dancer
## [1] "2002-04-01 UTC" "2005-03-11 UTC"
## ------------------------------------------------------------
## caribou2$ID: Prancer
## [1] "2014-04-04 00:01:00 UTC" "2016-09-02 08:00:00 UTC"
## ------------------------------------------------------------
## caribou2$ID: Vixen
## [1] "2007-04-01 00:00:00 UTC" "2009-08-01 16:00:00 UTC"
We see that all individuals have several years of monitoring and that Comet and Vixen have overlapping monitoring between 2007-04-01 and 2008-04-01.
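The same information can be gathered into one table per individual. This is only a sketch on made-up data (`toy` stands in for caribou2), and the column names `start`, `end`, `n_fixes` and `duration_days` are illustrative choices.

```r
library(dplyr)

# Toy data standing in for caribou2 (values are made up for illustration)
toy <- data.frame(
  ID = c("A", "A", "A", "B", "B"),
  Time = as.POSIXct(c("2006-04-01 00:00", "2006-04-02 00:00", "2006-04-11 00:00",
                      "2007-04-01 00:00", "2007-04-03 12:00"), tz = "UTC")
)

# One row per individual: first fix, last fix, number of fixes, duration
monitoring <- toy %>%
  group_by(ID) %>%
  summarise(start = min(Time),
            end = max(Time),
            n_fixes = n(),
            duration_days = as.numeric(difftime(max(Time), min(Time), units = "days"))) %>%
  as.data.frame()
monitoring
```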
It is very important when manipulating spatial data and time series to ALWAYS order by Individual and Time.
caribou2 <- caribou2[order(caribou2$ID, caribou2$Time),]
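Ordering matters because most movement metrics compare each fix with the previous one of the same animal. The sketch below, on made-up data, computes the time lag between consecutive fixes after ordering; the column name `dt_h` is an illustrative choice.

```r
# Toy data deliberately out of order (values are made up for illustration)
toy <- data.frame(
  ID = c("B", "A", "A", "B", "A"),
  Time = as.POSIXct(c("2006-04-01 01:00", "2002-04-01 08:00", "2002-04-01 00:00",
                      "2006-04-01 00:00", "2002-04-01 16:00"), tz = "UTC")
)

# Order by individual, then by time
toy <- toy[order(toy$ID, toy$Time), ]

# Hours since the previous fix of the same individual (NA for each first fix)
toy$dt_h <- ave(as.numeric(toy$Time), toy$ID,
                FUN = function(t) c(NA, diff(t)) / 3600)
toy
```

Negative or zero lags in such a column would flag unsorted or duplicated fixes.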
Caribou are a migratory species, so let’s look at the latitude of individuals through time using ggplot.
ggplot(data = caribou2, aes(x=Time, y=Lat)) + geom_line(aes(color = ID)) + facet_wrap(~ID, scales="free")
To visualize the movement data, we can plot the locations, for example with ggplot2.
# Plot the GPS data using ggplot2
ggplot(data=caribou2, aes(x=Lon, y=Lat)) + geom_point()
All the GPS locations are shown together, so we should color them by individual.
# Plot the GPS data using ggplot2, each color represents an individual
ggplot(data=caribou2, aes(x=Lon, y=Lat)) + geom_point(aes(color=ID))
This is better. We can also see the trajectories by plotting lines.
# Plot trajectories using ggplot2, each color represents an individual
ggplot(data=caribou2, aes(x=Lon, y=Lat)) + geom_path(aes(color=ID))
We can also see where individuals started and where they finished.
# first we add a vector of integers from the first to the last location for each individual
caribou2 <- caribou2 %>% group_by(ID) %>% mutate(timeSeq = seq(1,length(Time),1)) %>% ungroup %>% as.data.frame
head(caribou2)
## ID sex Time Year Lon Lat timeSeq
## 1 Comet m 2006-04-01 00:00:00 2006 -130.9916 63.97680 1
## 2 Comet m 2006-04-01 01:00:00 2006 -130.9920 63.97703 2
## 3 Comet m 2006-04-01 02:00:00 2006 -130.9928 63.98008 3
## 4 Comet m 2006-04-01 03:00:00 2006 -130.9918 63.98152 4
## 5 Comet m 2006-04-01 04:00:00 2006 -130.9900 63.98217 5
## 6 Comet m 2006-04-01 05:00:00 2006 -130.9901 63.98215 6
# verify that the sequence was computed per individual
by(caribou2$timeSeq, caribou2$ID, range)
## caribou2$ID: Comet
## [1] 1 4360
## ------------------------------------------------------------
## caribou2$ID: Dancer
## [1] 1 8563
## ------------------------------------------------------------
## caribou2$ID: Prancer
## [1] 1 2631
## ------------------------------------------------------------
## caribou2$ID: Vixen
## [1] 1 2520
Take Vixen as an example.
# Plot the trajectory of Vixen, with a color gradient following the time sequence
ggplot(data=caribou2 %>% subset(ID == "Vixen"), aes(x=Lon, y=Lat)) + geom_path(aes(color=timeSeq))
We see where Vixen started her journey and where her collar dropped off.
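If the start and end locations are needed as data rather than read off a color gradient, they can be extracted per individual. The sketch below assumes the data are already ordered by individual and time; `toy` and the `position` column are made up for illustration.

```r
library(dplyr)

# Toy data standing in for caribou2, already ordered by ID and time
toy <- data.frame(
  ID = rep(c("A", "B"), each = 3),
  Lon = c(-131.0, -130.9, -130.8, -133.8, -133.7, -133.6),
  Lat = c(64.0, 64.1, 64.2, 62.8, 62.9, 63.0),
  timeSeq = rep(1:3, times = 2)
)

# Keep the first and last row of each individual and label them
ends <- toy %>%
  group_by(ID) %>%
  slice(c(1, n())) %>%
  mutate(position = c("start", "end")) %>%
  ungroup() %>%
  as.data.frame()
ends
```

Such a table could then be drawn on top of a trajectory as an extra point layer in ggplot2.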
What would be very interesting now is to see where they are on a map. mapview is a very useful package that creates interactive maps. However, to use mapview, the data need to be spatial data with a defined coordinate reference system.
There are several ways to manipulate spatial objects. Here we use simple features (sf objects), which represent real-world objects on a computer through a spatial geometry (the coordinates of the object).
Let’s transform our data frame into a simple feature object using the package sf. Before doing so, it is important to know the coordinate reference system (CRS) of the GPS locations! For the caribou data, the coordinates are referenced in the latest version of the World Geodetic System (WGS), which is WGS84. To find the code of your coordinate system, you can go to epsg.io, which lists most coordinate systems. Search for World Geodetic System 84 to get the corresponding EPSG code (the value passed to crs), which is 4326.
# transform data into simple feature
caribou.sf <- caribou2 %>% st_as_sf(coords=c("Lon","Lat"), crs = 4326)
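A quick way to confirm that the CRS was assigned as intended is sketched below on two made-up points. It also shows the `remove = FALSE` argument of st_as_sf, which keeps the original coordinate columns alongside the geometry; whether that suits your workflow is a matter of preference.

```r
library(sf)

# Two made-up points; coords must be given in x (longitude), y (latitude) order
pts <- data.frame(ID = c("Comet", "Comet"),
                  Lon = c(-130.9916, -130.9920),
                  Lat = c(63.97680, 63.97703))
pts.sf <- st_as_sf(pts, coords = c("Lon", "Lat"), crs = 4326)

# Check the EPSG code and whether the coordinates are geographic (degrees)
st_crs(pts.sf)$epsg
st_is_longlat(pts.sf)

# remove = FALSE keeps the Lon and Lat columns in addition to the geometry
pts.sf2 <- st_as_sf(pts, coords = c("Lon", "Lat"), crs = 4326, remove = FALSE)
names(pts.sf2)
```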
Now that we have an sf object, let’s see what it looks like.
# first rows of the sf object
head(caribou.sf)
## Simple feature collection with 6 features and 5 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -130.9928 ymin: 63.9768 xmax: -130.99 ymax: 63.98217
## Geodetic CRS: WGS 84
## ID sex Time Year timeSeq geometry
## 1 Comet m 2006-04-01 00:00:00 2006 1 POINT (-130.9916 63.9768)
## 2 Comet m 2006-04-01 01:00:00 2006 2 POINT (-130.992 63.97703)
## 3 Comet m 2006-04-01 02:00:00 2006 3 POINT (-130.9928 63.98008)
## 4 Comet m 2006-04-01 03:00:00 2006 4 POINT (-130.9918 63.98152)
## 5 Comet m 2006-04-01 04:00:00 2006 5 POINT (-130.99 63.98217)
## 6 Comet m 2006-04-01 05:00:00 2006 6 POINT (-130.9901 63.98215)
# structure of the sf object
str(caribou.sf)
## Classes 'sf' and 'data.frame': 18074 obs. of 6 variables:
## $ ID : Factor w/ 4 levels "Comet","Dancer",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ sex : Factor w/ 2 levels "f","m": 2 2 2 2 2 2 2 2 2 2 ...
## $ Time : POSIXct, format: "2006-04-01 00:00:00" "2006-04-01 01:00:00" ...
## $ Year : int 2006 2006 2006 2006 2006 2006 2006 2006 2006 2006 ...
## $ timeSeq : num 1 2 3 4 5 6 7 8 9 10 ...
## $ geometry:sfc_POINT of length 18074; first list element: 'XY' num -131 64
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA
## ..- attr(*, "names")= chr [1:5] "ID" "sex" "Time" "Year" ...
The object is both a simple feature (sf) and a data frame. It contains 18074 features (the rows of our data frame), but not the same columns (fields): the Lon and Lat coordinates have been used to create the geometry column (where on Earth the feature is). To get the coordinates from an sf object, we use the function st_coordinates.
# get coordinates from an sf object
head(st_coordinates(caribou.sf)) # for example, st_coordinates(caribou.sf)[,1] gives the x (Longitude)
## X Y
## 1 -130.9916 63.97680
## 2 -130.9920 63.97703
## 3 -130.9928 63.98008
## 4 -130.9918 63.98152
## 5 -130.9900 63.98217
## 6 -130.9901 63.98215
# add the coordinates to the sf object
caribou.sf <- caribou.sf %>% mutate(Lon = st_coordinates(.)[,1], Lat = st_coordinates(.)[,2])
head(caribou.sf)
## Simple feature collection with 6 features and 7 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -130.9928 ymin: 63.9768 xmax: -130.99 ymax: 63.98217
## Geodetic CRS: WGS 84
## ID sex Time Year timeSeq geometry
## 1 Comet m 2006-04-01 00:00:00 2006 1 POINT (-130.9916 63.9768)
## 2 Comet m 2006-04-01 01:00:00 2006 2 POINT (-130.992 63.97703)
## 3 Comet m 2006-04-01 02:00:00 2006 3 POINT (-130.9928 63.98008)
## 4 Comet m 2006-04-01 03:00:00 2006 4 POINT (-130.9918 63.98152)
## 5 Comet m 2006-04-01 04:00:00 2006 5 POINT (-130.99 63.98217)
## 6 Comet m 2006-04-01 05:00:00 2006 6 POINT (-130.9901 63.98215)
## Lon Lat
## 1 -130.9916 63.97680
## 2 -130.9920 63.97703
## 3 -130.9928 63.98008
## 4 -130.9918 63.98152
## 5 -130.9900 63.98217
## 6 -130.9901 63.98215
We can now visualize our data on a map using mapview.
# visualize sf object, adding different color for each individual
mapview(caribou.sf, zcol = "ID")
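Points can also be joined into one line per individual before mapping. The sketch below uses made-up data and assumes the points are already ordered by individual and time; `toy`, `toy.sf` and `tracks` are illustrative names.

```r
library(sf)
library(dplyr)

# Toy points standing in for caribou.sf, ordered by ID and time
toy <- data.frame(
  ID = rep(c("A", "B"), each = 3),
  Lon = c(-131.0, -130.9, -130.8, -133.8, -133.7, -133.6),
  Lat = c(64.0, 64.1, 64.2, 62.8, 62.9, 63.0)
)
toy.sf <- st_as_sf(toy, coords = c("Lon", "Lat"), crs = 4326)

# Combine the points of each individual (do_union = FALSE preserves their order),
# then cast the resulting multipoint geometries to lines
tracks <- toy.sf %>%
  group_by(ID) %>%
  summarise(do_union = FALSE) %>%
  st_cast("LINESTRING")
tracks

# The lines could then be mapped in the same way as the points:
# mapview(tracks, zcol = "ID")
```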